-
Notifications
You must be signed in to change notification settings - Fork 164
A proposal for SDK configurations for metric aggregation (Basic Views) #126
A proposal for SDK configurations for metric aggregation (Basic Views) #126
Conversation
@open-telemetry/specs-metrics-approvers Please take a look at this. If there are no objections to this proposal - in principle, not in the exact details, please approve, so we can merge and get to work on having a more concrete specification for it. |
Overall, this proposal looks really good. Nice and to the point 😄 I do have some concerns, however, around not providing the ability to have multiple aggregations per instrument for GA:
It seems like it would be fairly straightforward to extend this proposal to include the ability to have multiple aggregations per instrument, and I can't see any blockers for adding that in the future. Is the main reason for not doing that now to avoid implementation complexity? TLDR: I think I'd feel a little more comfortable if this proposal included a few more details about how this would likely be extended to support multi-aggregation use cases in the future, as I think it's something we would want to make sure gets included shortly after GA. Additional Thoughts / Considerations for this OTEP:
|
While I also still have OC metrics at work that utilize multi-aggregation on the same measure, what really struck me about the last comment was the "provide recommended default aggregations" line. I hadn't thought about yet how I'd convert the metrics in my grpc library to opentelemetry. Right now it exports default views (aggregations) the user can choose to subscribe to specific ones in the list, all or none: https://github.com/tsloughter/grpcbox/blob/master/src/grpcbox_oc_stats.erl#L76 Maybe multi-aggregates has to come in a separate PR because it also needs to allow specifying the labels to keep per-aggregation. |
I don't know why an instrumentation library would ever want to recommend default aggregations. How can the grpc instrumentation writer know anything about backends or how the data needs to be delivered to one? |
@james-bebbington Can you elaborate on this? I don't understand what this means, or how it applies to a single instrument in a single process.
|
The library author definitely knows what metrics and aggregations may be of interest to the user. In the link I gave it shows client and server aggregations for sizes, latencies, number of requests. OC has specs for default views for common protocols like http, grpc, sql, etc https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/gRPC.md |
Available Instruments, sure, but aggregations are going to depend on the backend, aren't they? |
What do you mean by backend? Aside from a user wanting to define their own distribution buckets based on their particular services latencies I'm not sure what you mean. Do you mean like if a view is a histogram the user will want/need to specify the particular histogram impl to use? |
The aggregation generates metrics, which get sent somewhere (the telemetry backend). Those backends support different shapes of data being sent to them. Some may support histograms, some may not. Some may support MMSC summaries, some may not. In general, the instrumentation author can't possibly know where the data will end up, and hence can't tell people what aggregations to use. In my mental model, instrumentation uses Instruments with from the API and gives them meaningful names (generally based on semantic conventions). Those instruments have default aggregations assigned to them. If the operator's telemetry backend doesn't want the defaults, then the operator should use this proposed API to change those aggregations. |
Oh, I took it as up to the exporter or collector to deal with metrics in shapes that backends can't handle. Might not be perfect and still entail a user to override a default to use X instead of Y but that in the end the intention was to not have, "what aggregations does my backend support", be a regular thing the user should have to think about when defining an aggregation. |
But, an exporter can't turn a MMSC summary into a histogram, or a sum into a set of quantiles, right? In addition to capabilities, the operator might say "it's really important to get lots of details on metric xyz, and I know my backend supports quantiles, so I'll configure that aggregation, with these settings". Again, this isn't something that an instrumentation author can have any knowledge of. |
Right, though I don't like having that in the first place :)
True, but the instrumentation author does know what are likely useful aggregations, such as a distribution of latencies per method/route, and can make this simple for the user to enable if Otel provides the right tools. |
I don't know if I should propose a separate OTEP or what but I'm realizing now I think instruments and aggregations are too tied together. Thinking about how a view would just be a name, description, aggregation, label list and instrument name I realized this it also needs to be defined as where the checkpoint is and not the instrument. My proposal would make this change as well as removal any aggregation from the instrument definition. Instead, the instrument kind informs the SDK of what default aggregation to use for creating a default view per instrument. Recording a measurement to an instrument updates each view that is bound to it (preferably which is also subscribed to an exporter, but that can be an optimization for later), the accumulator checkpoints each view and the integrator does any necessary merging -- not sure the integrator and merging is actually necessary if the recording to a view does the dropping of labels itself? So if you create a counter and have the default SDK installed it will create a view by the same name that keeps all the labels in each recording and uses a sum aggregation. Hm, thought of another reason I find changing the aggregation on a counter funny. An |
@tsloughter So...it sounds like you are not willing to approve this proposal, then? I personally don't think we should try to re-do our instrument model before GA. |
I'm leaning that way, yea. But, my review also doesn't count towards the 4 approvals anyway :) And yea, I don't know if anyone else would get behind my tweaks to the model. It would keep the same API in place, I think, just the SDK would change to describe updating the view's value and checkpointing the view, instead of active instrument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the "mini-Views API".
Looks very good. left couple of small comments, non-blocking.
Re the question around instrumentation plugins needing to provide default aggregations, here's an example from OC to hopefully explain my thoughts. A typical plugin seems to:
That allows you to capture more data than most people need but not expose it by default. If a user, for example, wants to use your HTTP instrumentation library and include "path" info, they just define their own view. Even though we don't yet have automatic label injection in OT (presumably this is coming?), I think the same still applies. You may want to write a plugin that can capture a lot of dimensions, but only export some by default - can we support that case? Btw, I wasn't a contributor on OpenCensus so would be great if someone with more details could jump in 😄 |
I believe selecting and configuring labels to inject from the distributed context should be an eventual goal of the Views API. This is something OpenCensus did support, but I was not directly involved. |
Co-authored-by: Chris Kleinknecht <[email protected]>
|
||
What are some future changes that this proposal would enable? | ||
|
||
- A full-blown views API, which would allow multiple "views" per instrument. It's unclear how an exporter would specify which one it wanted, or if it would all the generated metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a rough prototype of the Views API that just got merged for Python. Right now the ability for multiple views per instrument is possible. Each view can specify the aggregation type, default set of label keys to aggregate by and aggregator configuration (histogram buckets and such). Something that hasn't been discussed yet is the temporality of the instruments (which is configured on the batcher and an SDK property, so currently all instruments will have the same behaviour). I'm not sure if there is need of new mechanisms exposed to the user to construct these configurations when Views achieve something similar (maybe they will turn into the Views API eventually anyways?).
Also, to maybe open up the floor for addressing open questions 3 and 4, could it possible to have the views handle all of the instrument related configuration that the exporter doesn't need to specify, and just have the temporality be controlled by the SDK (since exporters need to know about it). So for 3, the granularity of control for those configuration fields would be per instrument, and the granularity for temporality would be via regex. Although it might be too complex/too much configuration for a user to have to do. Maybe there is some way we can have the Views API handle all of the configuration (although then the exporter would have to know about it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not discuss a full-blown Views API in this OTEP. This OTEP is intentionally much more limited in scope, as I believe it is more achievable cross-language to be delivered in the GA timeframe.
Details for the open questions should be resolved in the specification PR. I'd like general consensus on an approach, via OTEP before we dive into the details, which should be done in the specification.
@open-telemetry/specs-metrics-approvers Please review if you have not already approved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the approach to reduce the scope of a full View API support, but it feels that if we don't start thinking in that direction will be very hard to support that in the future.
I would like us to start using the right terminologies and API names, with a limited capability.
Would like to see this merged with #128 |
- Example: select all ValueRecorders whose name ends with ".duration". | ||
* View - configures how the batching and aggregation should be done. | ||
- 3 things can be specified: The aggregation (Sum, MinMaxSumCount, Histogram, etc), the "temporality" of the batching, | ||
and a set of pre-defined labels to consider as the subset to be used for aggregations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have in my mind this optimization that I want you to not make me hard to achieve when the full views is approved. There needs to be 2 lists of labels (labels from the instruments record calls, labels from correlation context). So probably I would just suggest to make sure we clearly specify that these are labels from the record calls.
Another suggestion that I have is that later (most likely) we want to do label rename (or do we?) maybe not.
So maybe a design like:
Labels: List. Where then in the LabelKey we can have a source property, as well as maybe an "exported name" or other things, may not be a bad idea.
Right now we can start with only one string being in the LabelKey but later we can build on top of that.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the labels part of this is definitely the most fuzzy in my head. I'm not 100% sure of what is needed at this level. I was thinking initially of just having the configuration contain the set of labels to reduce down to, and not consider renaming or re-mapping or anything fancy like that.
So, assuming the instrumentation author has provided labels [a,b,c,d,e] the view might be configured just with [b,e].
MeterSdkProvider meterProvider = OpenTelemetrySdk.getMeterProvider(); | ||
|
||
// create a selector to select which instruments to customize: | ||
InstrumentSelector instrumentSelector = InstrumentSelector.newBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably at least the "meter name" must be provided. This may get into too many details that we can cover in the specs, if that is the intention maybe just remind this :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that you could provide just the instrument type, and get all of them (as a way to set the default).
- Use a Histogram aggregation, and only use two labels "route" and "error" for aggregations. | ||
- Use a quantile aggregation, and drop all labels when aggregating. | ||
|
||
In this proposal, there is only one configuration associated with each selector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean if two selectors match the same instrument only one View is active?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, with some sort of precedence to be determined later. (see my java prototype for an example).
|
||
Add support to the default SDK for the ability to configure Metric Aggregations. | ||
|
||
## Motivation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see also another config, simpler:
- Change default temporality for instrument type - separated than view or selector.
- Enable/Disable an instrument default aggregation.
Maybe this are out of scope, but kind of think that they may be configured separately than the views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that the "disable" use-case could just be done by setting the aggregation to a no-op aggregator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worried that having this "only one" view requirement and relying on that to enable/disable default aggregations, or even doing some merging of "Views" may be too tricky and too restrictive for the future.
I would like you to think how would you do this if full views API is supported.
Bogdan self brainstorming:
Maybe having a way to retrieve all the configured views (InstrumentDescriptor + View) and allow:
- "Delete" - the user will use this to disable default aggregation for example
- You can clone/change the retrieved "View" after deleting it and set it back to update the "temporality"
It seems a lot of work to just disable "default aggregations". Maybe treat "default views" a bit special, and offer specific APIs to manipulate "all of them" like:
- "default views temporality" - these APIs may include selectors to facilitate.
- "disable all default views"
But this way we make it clear that these "DefaultViewsConfiguration" API refers to only the default views and you no longer need the "one view only" requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way is to have a "strategy" param to the API: replace all (only supported option now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of considering this OTEP the "default views configuration". Maybe we can just simplify it down to just selecting by instrument type for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can be convinced that labels can be also an operation available for default views, but probably we can discuss that later. Let's start with something then incremently add more stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if for selector only type is important now, we can start only with that. But these decisions I think can be made in the specs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with leaving details to the specs. I'd be happy to drop label support for now, to keep things ultra-simple and build from there. Do you want this OTEP updated to reflect that, or is that something we can wrangle out in the specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a sentence that say details still to be polished in specs is more than enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few more open questions to capture the discussion, and a note that we will resolve them in the spec. Sound ok?
Spoke with @bogdandrutu offline. I think his comments above alleviate most of the concerns I had around supporting multiple views in the future. The only question not yet addressed was around supporting instrumentation of libraries that want to provide default "default views" (i.e. default aggregations / label sets) along with the instrumented metrics. There are a couple of possible options for this:
|
I don't want this OTEP to address the issue of API-level views. That is a separate concern, and I don't want to muddy the waters at this point. |
Fantastic question. 2 options that I see.
I'm totally happy either way. |
I'm fine with either of those as well. Which one is likely unimportant since it is the actual spec that developers looking to implement this will be relying on. |
1. Should custom aggregations should be allowable for all instruments? How should an SDK respond to a request for a non-supported aggregation? | ||
2. Should the requesting of DELTA vs. CUMULATIVE be only available via an exporter-only API, rather than generally available to all operators? | ||
3. Is regex-based name matching too broad and dangerous? Would the alternative (having to know the exact name of all instruments to configure) be too onerous? | ||
4. Is there anything in this proposal that would make implementing a full Views API (i.e. having multiple, named aggregations per instrument) difficult? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this question needs to be answered before this is merged. If the answer is "yes" this could be moving us away from our goal instead of towards it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if anyone thinks that this proposal will stop views from happening, please speak up now, so we can resolve it ASAP.
Decision during the metrics SIG was to merge this OTEP as a starting point and #128 will be changed to build on top of this. |
open-telemetry#126) * Add a proposal for SDK configurations for metric aggregation. * rename file to match the PR # * fix markdown lint issues * clarify that this applies to the default SDK, and fill out the open questions * update the java example to fix the naming changes * Add another open question to the list. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Chris Kleinknecht <[email protected]> * Update to use the 'view' terminology * another configuration/view replacement * Add a few more open questions, and a note that they will be resolved in the spec. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Chris Kleinknecht <[email protected]> Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Bogdan Drutu <[email protected]>
open-telemetry#126) * Add a proposal for SDK configurations for metric aggregation. * rename file to match the PR # * fix markdown lint issues * clarify that this applies to the default SDK, and fill out the open questions * update the java example to fix the naming changes * Add another open question to the list. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Chris Kleinknecht <[email protected]> * Update to use the 'view' terminology * another configuration/view replacement * Add a few more open questions, and a note that they will be resolved in the spec. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Chris Kleinknecht <[email protected]> Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Bogdan Drutu <[email protected]>
open-telemetry#126) * Add a proposal for SDK configurations for metric aggregation. * rename file to match the PR # * fix markdown lint issues * clarify that this applies to the default SDK, and fill out the open questions * update the java example to fix the naming changes * Add another open question to the list. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Chris Kleinknecht <[email protected]> * Update to use the 'view' terminology * another configuration/view replacement * Add a few more open questions, and a note that they will be resolved in the spec. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Chris Kleinknecht <[email protected]> Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Bogdan Drutu <[email protected]>
open-telemetry/oteps#126) * Add a proposal for SDK configurations for metric aggregation. * rename file to match the PR # * fix markdown lint issues * clarify that this applies to the default SDK, and fill out the open questions * update the java example to fix the naming changes * Add another open question to the list. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Chris Kleinknecht <[email protected]> * Update to use the 'view' terminology * another configuration/view replacement * Add a few more open questions, and a note that they will be resolved in the spec. * Update text/0126-Configurable-Metric-Aggregations.md Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Chris Kleinknecht <[email protected]> Co-authored-by: Tyler Yahn <[email protected]> Co-authored-by: Bogdan Drutu <[email protected]>
This proposal is less than a full-blown Views API. The goal is just to give some flexibility to operators and exporter authors around how metrics should be aggregated by the SDK.